Decrypting Allostery in Membrane-Bound K-Ras4B Using Complementary In Silico Approaches Based on Unbiased Molecular Dynamics Simulations

Protein functions are dynamically regulated by allostery, which enables conformational communication even between faraway residues, and expresses itself in many forms, akin to different “languages”: allosteric control pathways predominating in an unperturbed protein are often unintuitively reshaped whenever biochemical perturbations arise (e.g., mutations). To accurately model allostery, unbiased molecular dynamics (MD) simulations require integration with a reliable method able to, e.g., detect incipient allosteric changes or likely perturbation pathways; this is because allostery can operate at longer time scales than those accessible by plain MD. Such methods are typically applied singularly, but we here argue their joint application—as a “multilingual” approach—could work significantly better. We successfully prove this through unbiased MD simulations (∼100 μs) of the widely studied, allosterically active oncotarget K-Ras4B, solvated and embedded in a phospholipid membrane, from which we decrypt allostery using four showcase “languages”: Distance Fluctuation analysis and the Shortest Path Map capture allosteric hotspots at equilibrium; Anisotropic Thermal Diffusion and Dynamical Non-Equilibrium MD simulations assess perturbations upon, respectively, either superheating or hydrolyzing the GTP that oncogenically activates K-Ras4B. Chosen “languages” work synergistically, providing an articulate, mutually coherent, experimentally consistent picture of K-Ras4B allostery, whereby distinct traits emerge at equilibrium and upon GTP cleavage. At equilibrium, combined evidence confirms prominent allosteric communication from the membrane-embedded hypervariable region, through a hub comprising helix α5 and sheet β5, and up to the active site, encompassing allosteric “switches” I and II (marginally), and two proposed pockets. Upon GTP cleavage, allosteric perturbations mostly accumulate on the switches and documented interfaces.

Starting Structure for Equilibrium MD and View of Key K-Ras4B Complexes The G domain (residues 2-166; yellow); GTP (sticks); and Mg 2+ (dark green sphere) are modeled after PDB ID: 6vjj. 1 The hypervariable region (HVR; residues 167-185; salmon) and the POPS/POPC phospholipid bilayer (lines) in which it is embedded through farnesylated Cys185 (C as large salmon spheres) are modeled from previous simulations. 2Solute ions Na + and Cl -are rendered as purple and pale green spheres, respectively.(b) View from a different angle to reveal the location of the K-Ras4B-GAP interface (GAP in pale green; PDB ID: 1wq1), 3 showing all salient regions of the former's G-domain with the same color code as Figure 1.A catalytically crucial arginine administered by GAP (Arg789) is labeled and rendered as sticks and shown alongside Lys16 and Gln61; Na + and Cl -are omitted.(c) View from yet another angle to reveal the location of the K-Ras4B-RAF1 effector interface (cysteine-rich domain of RAF1; pale blue; residues rendered as lines; PDB ID: 6xi7). 1 Color code otherwise identical to Figure 1.Asp153 on helix α5 of K-Ras4B and other representative residues common to several effector interfaces are also rendered as sticks and/or labeled.Na + and Cl -are omitted.In all panels, H and solvent are omitted for clarity.Color code for explicit non-C atoms/ions: Mg 2+ : dark green; Cl: pale green; Na: purple; P: orange; O: red; N: blue; S: yellow.

MD Preproduction Details
Run input files for all preproduction stages are provided electronically.Unless explicitly mentioned, parameters employed are either set to GROMACS defaults, 4 or identically to the MD production stage (see main text).

Minimization
Starting from the equilibrated, solvated, membrane-embedded K-Ras4B structure set up as discussed in the main text, all equilibrium MD replicas were begun by a round of steepest-descent structural minimization stopping once all atomic forces dropped below a 1000 kJ mol -1 nm -1 threshold to machine precision (usually taking about 2000 steps).No constraints were present at this stage, including on bonds containing hydrogens.In addition, the cutoff for the calculation of Lennard-Jones and Coulomb interactions was 8 Å rather than the 12 Å used at later stages.Beyond this limit, Coulomb interactions were computed in reciprocal space, with the Particle Mesh Ewald method 5 rather than in direct space, and Lennard-Jones interactions were set to zero (at this stage with no corrections).

ps NVT Equilibration
Following minimization, all replicas were subjected to an initial 100 ps round of equilibration in the NVT ensemble (T = 300 K): it was at this stage that atomic velocities for the entire system (a different set for each replica) were randomly assigned to match a Boltzmann distribution.Compared to the production stage, the timestep for the leap-frog integrator 6 was shorter, at 1 fs: to compensate, neighboring atom lists were made to update every 20 integrator steps rather than every 10.For this (and the next) equilibration stage, we imposed harmonic restraints (k = 1000 kJ mol -1 Å -2 in all directions) on protein backbone heavy atoms, Mg 2+ cation, all GTP atoms, and all lipid heavy atoms.
Temperature enforcement conditions and couplings were identical to those used in the production stage.The cutoff for the computation of Lennard-Jones and Coulomb interactions was lengthened to 12 Å from this stage onwards; energy and pressure corrections to discrepancies arising from Lennard-Jones truncation were henceforth applied beyond this cutoff.Also from this stage onwards, LINCS 7 (non-water) and SETTLE 8 (water) constraints were introduced for all hydrogen-containing bonds.

ps NpT Equilibration
After NVT equilibration, there followed a further 100 ps of equilibration, this time in the NpT ensemble (T = 300 K; p = 1 bar), with all previous atomic restraints retained.Contrarily to the production stage, constant pressure was in this case maintained by the Parrinello-Rahman barostat, 9 always with a time constant of 2 ps.Coupling to the barostat was still semi-isotropic, as described in the main text.Neighboring atoms continued to be searched for every 20 steps, but for this stage only, cutoffs for neighbor searching were temporarily shortened to 8 Å.All other conditions present in the previous equilibration stage were retained.

Input, Matrix Derivation, and Further Details on SPM
All input generation discussed in this subsection is carried out starting from our 5 µs metatrjectory.
All steps are directly performed with the cpptraj tool, 10 whose input for each step we provide electronically.

Alignment and Generation of Average Structure for Clustering
Prior to clustering, our K-Ras4B metatrajectory is processed as follows: • All frames are aligned to the very first metatrajectory frame (originally from replica 1), based on minimizing the root-mean-square deviation (RMSD) of Cα atoms.Cα atoms of HVR residues are included in this and subsequent alignments.
• The resulting Cα-aligned metatrajectory is used to derive an average structure ("average" command in cpptraj). 10The Cα-aligned metatrajectory is realigned to Cα atoms of the average structure.

Clustering
Clustering is performed on every 50 th frame of the realigned metatrajectory, or more specifically, after retaining every 25 th frame (one every 50 ps), and sieving every second frame from this pruned metatrajectory to perform clustering.The algorithm used is the average-linkage hierarchical agglomerative approach as implemented in cpptraj. 10The criterion on which to cluster was the RMSD of Cα atoms across the pruned metatrajectory, and the clustering run was made to stop once five clusters were found.The representative structure for the most populated cluster (lowest distance to centroid) was used for the final realignment of the (non-pruned) metatrajectory (see next subsection).

Calculation of Average Distance and Correlation Matrices
Alignment of each frame of the original metatrajectory to the representative structure of the most populated cluster was again carried out based on RMSD of Cα atoms, after which, the "matrix" command in cpptraj 10 is used twice in succession to produce both the distance matrix and the pairwise correlation matrix (see input file provided electronically for syntax; we also provide the matrices themselves).As per formulae provided in the main text, DynaComm.py 11then uses these matrices directly (together with a chosen structure of the protein at hand) to work out which node-node edges should be represented in the pre-SPM graph (i.e., those linking residues closer than 6 Å), and how heavy these edges should be.

Full Details on SPM Branches I-V
We here describe the paths of SPM branches I-V picking them up as they eradiate from the central allosteric hub.An overview of Branch V and its subbranch V.3 is provided in the main text.
Branch I (Figure 3; bottom) returns to Glu162 from Pro110 and conveys allosteric signals along almost all of helix α5, incidentally avoiding the flexible Asp153 identified by DF analysis.Branch II (Figure 3; bottom) reverberates to sheet β6, through which it travels in its entirety before reaching Ser145 and Ala146 on the β6-α5 loop and, therefore, the binding site (guanine moiety) and further jumping across to Leu19 on helix α1.Branch III (Figure 3; bottom) terminates abruptly at Ala134 in the C-terminus of α4, after crossing to the α4-β6 loop.Branch IV (Figure 3; bottom) departs Pro110 and moves back towards the N-terminus along the α3-β5 loop, dissipating at Ser106.
As stated, Branch V branches off to β4 from the hub on β5, with a dual coupling from Pro110 (β5) to Gly77 (β4) and from Met111 to Phe78 (β4); from there it further branches out into three subbranches V.1, V.2, and V.3.V.1 crosses almost the entirety of sheet β1 and V.2 just reaches the C-terminal Arg73 on α2, just outside of switch II.V.3 is discussed in the main text.

List of Allosterically Disruptive Mutations Appearing on the SPM
The 6 allosteric sites in the SPM that appear amongst the 8 reported by Weng et al. 12 include Val7 (β1), Thr58, Ala59 (switch II), and-on our allosteric hub and unsurprisingly with the highest degrees of "closeness"-Pro110 (α3-β5), Phe141 (β6), and Ile163 (α5).A seventh site, Asp54 on β3, while absent from the SPM, interacts electrostatically with SPM node Lys5 (β1) (and is thus shown in pink in Figure 7).Gly10, at the N-terminus of the P-loop, is the only relevant allosteric site that is neither part of the SPM nor in contact with one of its residues.
Further D-NEMD Details

Isolation of "Reactive" Frames from Equilibrium MD simulations
Our MD metatrajectory comprises a total of 100000 frames saved with stored atomic velocities.20 of these are automatically unviable for the D-NEMD simulations, as they are saved at the very end of their respective equilibrium replica: D-NEMD simulations begun from these frames would have no equilibrium counterpart against which to monitor Cα deviation.Of the remaining 99980 frames (4999 per replica), we omit the first 100 frames of each replica (further lowering the total of viable frames down to 4899).We furthermore only begin D-NEMD simulations from those 71706 frames that, based chemical considerations from the previous EVB study by Calixto and coworkers, 13 meet certain criteria.Frames recognized as "reactive" come from across all 20 MD replicas and are tabulated in Table S1.
Criteria that should simultaneously be met by an equilibrium MD frame to qualify as "reactive" are as follows: 1.That there exists a water molecule (the nucleophile), whose oxygen should be: a. no farther than 3.70 Å from the Pγ atom in GTP; and b. poised to attack GTP with an angle of >145° with respect to the Pγ-O3β bond.
2. That at least one of the three Hζ hydrogens in Lys16-which is important 13 to contrast the accumulation of negative charge in the transition state during GTP hydrolysis-is no farther than 2.19 Å from at least one of the Oγ (terminal) atoms of GTP.
In addition, optionally (see main text; Results and Discussion), we opted to recalculate D-NEMD statistics with a more stringent (and chemically plausible) criterion to determine reactive poses: 3. That the Cδ atom in Gln61, which under the right conformational conditions helps position the nucleophile for attack, 13 is no farther than 4.45 Å from one of the two hydrogens in the nucleophilic water molecule.
Since, in the absence of a GAP protein, Gln61 is conformationally freer, the inclusion of the third criterion greatly reduces the number of reactive poses (cf.Table S1), while not affecting the final results (data not shown).All of the (rather lax) thresholds in criteria 1-3 above were decided based on radial distribution functions involving the atoms in question (or histogram, in the case of the nucleophilic attack angle), choosing them to include the entire first peak (data not shown).
Table S1.Summary of the total frames (windows) on which D-NEMD simulations were conducted, and those that survived for their whole 50 ps duration.Our main objective when automatically introducing the perturbation at the start of each D-NEMD window was to instantaneously reproduce the GTP 4-+ H 2 O → GDP 3-+ [H 2 PO 4 ] -reaction as closely as possible by changing the positions of only 4 atoms: Pγ in GTP; and the three atoms composing the nucleophilic water molecule.In this reaction, one of the γ-oxygens of GTP is acting as the base that deprotonates the nucleophilic water. 13The beginning and outcome of a typical reconstruction are illustrated in Figure S3.where  ⃗ was defined in 3b, and  %%%%%%⃗ $%& and  %%%%%%⃗ !"# denote the Owat → Hwat1 before and after rotation, respectively.This operation only yields the correct Hwat1-Owat-Pγ angle; lengths and dihedrals are fixed in the next substeps.

Replica
e. Shortening the new Owat → Hwat1 vector to its desired equilibrium length of 0.974 Å 14 (by normalizing it to a unit vector and multiplying the latter by 0.974 Å).
f. Measurement of the old dihedral Hwat1-Owat-Pγ-Oγ base dihedral.Requires finding the normal vectors to the planes Hwat1-Owat-Pγ and Owat-Pγ-Oγ base , and taking the arccosine of their dot product.
g. Rotation of the Hwat1-Owat-Pγ-Oγ base dihedral about the Owat-Pγ bond, from the old dihedral angle to the prescribed 180°, 14 using the same formula in 3d.
h. Establishment of the final "new" coordinates of Hwat1 after the rotation.
4. Displacement of the remaining hydrogen on nucleophilic water (Hwat2)-now the only one remaining in its old position-to Oγ base , again at the prescribed 14 geometry for [H 2 PO 4 ] -.This is done following precisely the same substeps as in 3, except that the bond to regulate is now Oγ base -H wat2 ; the angle is now Hwat2-Oγbase-Pγ; and the dihedral is now Hwat2-Oγbase-Pγ-O wat .
The [H 2 PO 4 ] -anion resulting from this operation-at equilibrium geometry 14 -is shown in Figure S3 bottom.Note that positions of O1γ, O2γ, O3γ, and O3β have remained unchanged, and only four atoms have moved.

Figure
Figure S1.(a)Starting structure and periodic cell for simulations, set up as discussed in Computational Methods.The G domain (residues 2-166; yellow); GTP (sticks); and Mg 2+ (dark green sphere) are modeled after PDB ID: 6vjj.1 The hypervariable region (HVR; residues 167-185; salmon) and the POPS/POPC phospholipid bilayer (lines) in which it is embedded through farnesylated Cys185 (C as large salmon spheres) are modeled from previous simulations.2Solute ions Na + and Cl -are rendered as purple and pale green spheres, respectively.(b) View from a different angle to reveal the location of the K-Ras4B-GAP interface (GAP in pale green; PDB ID: 1wq1),3 showing all salient regions of the former's G-domain with the same color code as Figure1.A catalytically crucial arginine administered by GAP (Arg789) is labeled and rendered as sticks and shown alongside Lys16 and Gln61; Na + and Cl -are omitted.(c) View from yet another angle to reveal the location of the K-Ras4B-RAF1 effector interface (cysteine-rich domain of RAF1; pale blue; residues rendered as lines; PDB ID: 6xi7).1 Color code otherwise identical to Figure1.Asp153 on helix α5 of K-Ras4B and other representative residues common to several effector interfaces are also rendered as sticks and/or labeled.Na + and Cl -are omitted.In all panels, H and solvent are omitted for clarity.Color code for explicit non-C atoms/ions: Mg 2+ : dark green; Cl: pale green; Na: purple; P: orange; O: red; N: blue; S: yellow.

Figure S2 .
Figure S2.Overview of D-NEMD simulations setup, exemplified for equilibrium MD replica 1 and repeated for the remaining 19 replicas.Barring an initial 5 ns period (labeled "Equil.";pale orange box), poses saved with velocities every 50 ps in the 5 to 245.95 ns interval (thick black lines) are assessed based on loose "reactive" (R) or "unreactive" (U) character, based on the vicinity of the nucleophilic water to GTP:Pγ.From each "R" pose, a new D-NEMD window is initiated after instant hydrolysis of GTP (see main text and Supporting Information).After perturbation (6 ps; yellow boxes), windows developing unviable product geometry (u) are discarded, while D-NEMD production (44 ps; green boxes) is initiated for viable geometries (v).Deviation of Cα atoms between equilibrium and non-equilibrium MD at equivalent points in time is then monitored across each window pair, and averaged across all windows and replicas to obtain final per-residue values at regular intervals over 50 ps.

Figure S3 . 2 .
Figure S3.Example of the transformation (perturbation) performed at the beginning of each D-NEMD run on the corresponding "parent" pose issuing from equilibrium MD, whereby GTP 4-and a nucleophilic water (top) are converted to GDP 3-+ [H2PO4] -(bottom) by only moving four atoms.The resulting [H2PO4] -(bottom) is constructed at equilibrium geometry.Color codes are identical to Figure 1b, except that the salient G-domain regions are left in yellow.All solvent molecules apart from the nucleophilic water have been omitted.Using an in-house script, and relying on the NumPy module in Python, the steps taken to enact Reconstruction of GDP and [H 2 PO 4 ] -from H 2 O and GTP